
    A Neural Network Architecture for Figure-ground Separation of Connected Scenic Figures

    A neural network model, called an FBF network, is proposed for automatic parallel separation of multiple image figures from each other and their backgrounds in noisy grayscale or multi-colored images. The figures can then be processed in parallel by an array of self-organizing Adaptive Resonance Theory (ART) neural networks for automatic target recognition. An FBF network can automatically separate the disconnected but interleaved spirals that Minsky and Papert introduced in their book Perceptrons. The network's design also clarifies why humans cannot rapidly separate interleaved spirals, yet can rapidly detect conjunctions of disparity and color, or of disparity and motion, that distinguish target figures from surrounding distractors. Figure-ground separation is accomplished by iterating operations of a Feature Contour System (FCS) and a Boundary Contour System (BCS) in the order FCS-BCS-FCS, hence the term FBF, that have been derived from an analysis of biological vision. The FCS operations include the use of nonlinear shunting networks to compensate for variable illumination and nonlinear diffusion networks to control filling-in. A key new feature of an FBF network is the use of filling-in for figure-ground separation. The BCS operations include oriented filters joined to competitive and cooperative interactions designed to detect, regularize, and complete boundaries in up to 50 percent noise, while suppressing the noise. A modified CORT-X filter is described which uses both on-cells and off-cells to generate a boundary segmentation from a noisy image.
    Air Force Office of Scientific Research (90-0175); Army Research Office (DAAL-03-88-K0088); Defense Advanced Research Projects Agency (90-0083); Hughes Research Laboratories (S1-804481-D, S1-903136); American Society for Engineering Education
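
    The shunting stage of the FCS is concrete enough to illustrate numerically. Below is a minimal sketch, assuming a standard shunting on-center off-surround equation at equilibrium with a uniform local surround; the parameter values and the uniform_filter surround are illustrative choices, not taken from the paper.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def shunting_normalize(image, A=1.0, B=1.0, D=1.0, size=5):
            # Equilibrium of dx/dt = -A*x + (B - x)*C - (D + x)*E, with
            # on-center input C (the pixel itself) and off-surround input E
            # (a local mean). Ratios survive a global rescaling of the
            # image, so variable illumination is discounted.
            C = image.astype(float)
            E = uniform_filter(C, size=size)
            return (B * C - D * E) / (A + C + E)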

    A Neural Network Model of Auditory Scene Analysis and Source Segregation

    In environments with multiple sound sources, the auditory system is capable of teasing apart the impinging jumbled signal into different mental objects, or streams, as in its ability to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARTSTREAM model, is presented that groups different frequency components based on pitch and spatial location cues, and selectively allocates the components to different streams. The grouping is accomplished through a resonance that develops between a given object's pitch, its harmonic spectral components, and (to a lesser extent) its spatial location. Those spectral components that are not reinforced by being matched with the top-down prototype read out by the selected object's pitch representation are suppressed, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, mechanisms. The model is used to simulate data from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. The model also simulates illusory auditory percepts such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. The stream resonances provide the coherence that allows one voice or instrument to be tracked through a multiple source environment.
    Air Force Office of Scientific Research (F49620-92-J-0225); National Science Foundation (IRI-90-00530); Office of Naval Research (N00014-92-J-4015); British Petroleum (89A-1204)
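
    The suppression-and-capture step lends itself to a small illustration. The following Python sketch assumes spectral components arrive as parallel frequency and amplitude arrays and uses a fixed relative tolerance around harmonics of the selected pitch as a stand-in for the model's top-down prototype; the tolerance value is an arbitrary choice.

        import numpy as np

        def match_and_capture(freqs, amps, f0, tol=0.03):
            # Components near a harmonic of the selected pitch f0 are kept
            # in the current stream; unmatched components are suppressed
            # from it and remain available for another stream to capture,
            # as in the old-plus-new heuristic.
            h = np.round(freqs / f0)
            rel_err = np.abs(freqs - h * f0) / freqs
            matched = (h >= 1) & (rel_err < tol)
            return amps * matched, amps * ~matched  # (stream, residual)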

    A voice interface for sound generators: adaptive and automatic mapping of gestures to sound

    Sound generators and synthesis engines expose a large set of parameters, allowing run-time timbre morphing and exploration of sonic space. However, control over these high-dimensional interfaces is constrained by the physical limitations of performers. In this paper we propose the exploitation of vocal gesture as an extension or alternative to traditional physical controllers. The approach uses dynamic aspects of vocal sound to control variations in the timbre of the synthesized sound. The mapping from vocal to synthesis parameters is automatically adapted to information extracted from vocal examples as well as to the relationship between parameters and timbre within the synthesizer. The mapping strategy aims to maximize the breadth of the explorable perceptual sonic space over a set of the synthesizer's real-valued parameters, indirectly driven by the voice-controlled interface.
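
    The paper's adaptation procedure is not reproduced here, but the idea of weighting a mapping by each parameter's perceptual effect can be sketched. In the hypothetical code below, render_timbre stands in for any function that renders a parameter vector and returns a timbre descriptor; sensitivities are estimated by finite differences and then used to concentrate vocal control on the parameters that move timbre most.

        import numpy as np

        def estimate_sensitivity(render_timbre, params, eps=0.05):
            # Perturb each normalized parameter and measure how far the
            # timbre descriptor moves; a larger displacement means a more
            # perceptually effective parameter.
            base = render_timbre(params)
            sens = np.zeros(len(params))
            for i in range(len(params)):
                p = params.copy()
                p[i] = min(1.0, p[i] + eps)
                sens[i] = np.linalg.norm(render_timbre(p) - base) / eps
            return sens

        def map_voice(features, sens):
            # Compress the range of parameters with little timbral effect
            # so vocal features mostly steer the perceptually useful ones.
            return np.clip(features * sens / sens.max(), 0.0, 1.0)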

    A Neural Network for Synthesizing the Pitch of an Acoustic Source

    This article describes a neural network model capable of generating a spatial representation of the pitch of an acoustic source. Pitch is one of several auditory percepts used by humans to separate multiple sound sources in the environment from each other. The model provides a neural instantiation of a type of "harmonic sieve". It is capable of quantitatively simulating a large body of psychoacoustical data, including new data on octave shift perception.
    Air Force Office of Scientific Research (90-0128, 90-0175); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI 90-24877); American Society for Engineering Education

    ARTSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARTSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between the frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can hereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
    Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society for Engineering Education
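
    The bottom-up/top-down loop described above can be caricatured in a few lines. The sketch below assumes a precomputed matrix of harmonic weights, one row per pitch category, and a simple multiplicative top-down gain; the model's shunting dynamics and self-organized filters are replaced by this toy fixed-point iteration.

        import numpy as np

        def spectral_pitch_resonance(spectrum, filters, gain=0.5, n_iter=20):
            # filters: (n_pitches, n_bins) harmonic weight matrix, assumed given.
            s = spectrum.astype(float).copy()
            winner = -1
            for _ in range(n_iter):
                pitch = filters @ s                  # bottom-up harmonic filter
                new = int(np.argmax(pitch))          # pitch category choice
                template = filters[new] > 0          # top-down expectation
                s = spectrum * np.where(template, 1.0, 1.0 - gain)  # matching
                if new == winner:                    # stable winner: resonance
                    break
                winner = new
            return winner, s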

    A Spectral Network Model of Pitch Perception

    A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and analyzed. The model neurally instantiates ideas from the spectral pitch modeling literature and joins them to basic neural network signal processing designs to simulate a broader range of perceptual pitch data than previous spectral models. The components of the model are interpreted as peripheral mechanical and neural processing stages, which are capable of being incorporated into a larger network architecture for separating multiple sound sources in the environment. The core of the new model transforms a spectral representation of an acoustic source into a spatial distribution of pitch strengths. The SPINET model uses a weighted "harmonic sieve" whereby the strength of activation of a given pitch depends upon a weighted sum of narrow regions around the harmonics of the nominal pitch value, and higher harmonics contribute less to a pitch than lower ones. Suitably chosen harmonic weighting functions enable computer simulations of pitch perception data involving mistuned components, shifted harmonics, and various types of continuous spectra including rippled noise. It is shown how the weighting functions produce the dominance region, how they lead to octave shifts of pitch in response to ambiguous stimuli, and how they lead to a pitch region in response to the octave-spaced Shepard tone complexes and Deutsch tritones without the use of attentional mechanisms to limit pitch choices. An on-center off-surround network in the model helps to produce noise suppression, partial masking and edge pitch. Finally, it is shown how peripheral filtering and short term energy measurements produce a model pitch estimate that is sensitive to certain component phase relationships.
    Air Force Office of Scientific Research (F49620-92-J-0225); American Society for Engineering Education
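
    A minimal version of the weighted sieve is easy to write down. In the sketch below each candidate pitch accumulates Gaussian-windowed amplitude around its harmonics, scaled by a geometric decay over harmonic number; the window width and decay rate are illustrative stand-ins for the paper's fitted weighting functions.

        import numpy as np

        def spinet_pitch_strength(freqs, amps, candidates, n_harm=12,
                                  sigma=0.03, decay=0.8):
            # Each harmonic h of a candidate pitch contributes amplitude
            # gathered in a narrow Gaussian window around h*f0, weighted by
            # decay**(h-1) so higher harmonics contribute less; the result
            # is a spatial map of pitch strength across the candidate axis.
            strength = np.zeros(len(candidates))
            for j, f0 in enumerate(candidates):
                for h in range(1, n_harm + 1):
                    w = np.exp(-0.5 * ((freqs - h * f0) / (sigma * h * f0)) ** 2)
                    strength[j] += decay ** (h - 1) * np.sum(w * amps)
            return strength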

    Instrumentalizing Synthesis Models

    An important part of building interactive sound models is designing the interface and control strategy. The multidimensional structure of the gestures natural for a musical or physical interface may have little obvious relationship to the parameters that a sound synthesis algorithm exposes for control. A common situation arises when there is a nonlinear synthesis technique for which a traditional instrumental interface with quasi-independent control of pitch and expression is desired. This paper presents a semi-automatic meta-modeling tool called the Instrumentalizer for embedding arbitrary synthesis algorithms in a control structure that exposes traditional instrument controls for pitch and expression.
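
    A toy example of the kind of wrapper such a tool might produce (the class and its mapping are hypothetical, not the Instrumentalizer's actual output): an FM voice is hidden behind the two traditional controls, with pitch tied to the carrier frequency and expression spread across the remaining timbre parameters so the two stay quasi-independent.

        import numpy as np

        class Instrumentalized:
            # Wraps a 2-operator FM voice behind pitch/expression controls.
            def __init__(self, sample_rate=44100):
                self.sr = sample_rate

            def render(self, pitch_hz, expression, dur=0.5):
                t = np.arange(int(self.sr * dur)) / self.sr
                index = 0.5 + 4.0 * expression       # expression -> mod index
                ratio = 1 + round(expression * 2)    # expression -> coarse ratio
                mod = np.sin(2 * np.pi * pitch_hz * ratio * t)
                return np.sin(2 * np.pi * pitch_hz * t + index * mod)

        # Usage: middle C with moderate expression
        # wave = Instrumentalized().render(pitch_hz=261.63, expression=0.4)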